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Abstract 

There exist efficient algorithms to project a point onto the intersection 
of a convex cone and an affine subspace. Tliose conic projections are in 
turn the work-horse of a range of algorithms in conic optimization, having 
a variety of applications in science, finance and engineering. This chapter 
reviews some of these algorithms, emphasizing the so-called regularization 
algorithms for linear conic optimization, and applications in polynomial 
optimization. This is a presentation of the material of several recent 
research articles; we aim here at clarifying the ideas, presenting them in 
a general framework, and pointing out important techniques. 



1 Introduction, motivations, outline 

1.0.1 Projection onto semidefinite positive matrices 

Consider the space 5„ of symmetric n-hy-n matrices, equipped with the norm 
associated to the usual inner product 

n 

{X,Y) = XijY.j = trace(X^y). 

The subset made of positive semidefinite matrices forms a closed convex cone 
of iS„. A general result for closed convex sets yields that we can project onto S^: 
given C G 5,1 , there exists an unique element of 5+ (called the projection of C 
onto and denoted by Projg+(C)) such that 

||Proj5+(C)-q| = min IIX-CII. 

It turns out that we also have an explicit expression of this projection, through 
the spectral decomposition of C. Consider indeed the decomposition 

C = C/Diag(Ai,...,A„)C/^ 
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where Ai > • • ■ > A„ are the eigenvalues of C and /7 is a corresponding or- 
thonormal matrix of eigenvectors of C ; then the projection of C onto 5+ is 

Proj5+(C) = C/Diag(max(0,Ai),...,max(0,A„))C/^. (1) 

This result was noticed early by statisticians |SA79) (see also |Hig88| ), and 
since then this projection has been widely used. We notice that this result 
generalizes nicely for "spectral sets" ; see jLM 08; . Note also that the numerical 
cost of computing this projection is essentially that of computing the spectral 
decomposition of C, the matrix to project. 

The developments of this chapter show that more sophisticated projections 
onto subsets of are also computable using standard tools of numerical op- 
timization. More specifically, the subsets that we consider are intersections of 
the cone with a polyhedron (defined as affine equalities and inequalities). 
Though the projection onto those intersections is not explicit anymore, we still 
have efficient algorithms to compute them, even for large-scale problems. 

1.0.2 Projection onto correlation matrices 

The most famous example of such projections is the projection onto the set of 
correlation matrices (that are the real symmetric positive semidefinite matrices 
with ones on the diagonal). It is common to be faced with a matrix that is 
supposed to be a correlation matrix but for a variety of reasons is not. For 
example, estimating correlation matrices when data come from different time 
frequencies may lead to a non-positive semidefinite matrix. Another example is 
stress-testing in finance: a practitioner may wish to explore the effect on a port- 
folio of assigning certain correlations differently from the historical estimates, 
but this operation can destroy the semidefiniteness of the matrix. 

These important practical questions have led to much interest in the prob- 
lem of computing the nearest correlation matrix to a given a matrix C (see 
e.g. |Hig02| , |Mal04j . [QSOe, and BH08 ). This problem is simply formulated as 
the projection of C onto correlation matrices 

min l\\X-Cf 

Xu^l, ^ = l,...,n (2) 

The methods reviewed in this chapter apply to solving this problem in particular. 
The point is that this problem (and variants of it) can now be solved efficiently 
(for sizes up to n = 5000; the only limitation on a standard computer is the 
memory constraints). 

1.0.3 Conic projection problem 

The general problem that we first focus on in this chapter is the following. In 
the space K" equipped with the standard inner product, we want to compute 
the projection of a point c S K" onto the intersection IC DV where 
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• /C is a closed convex cone in M" (that we further assume to have full 
dimension in R"; that is, int JC ^%), 

• V is 'A convex polyhedron defined by affinc (in)equalities 

V := {x R" : x = (or <) 5^, « 1, . . . , m] . 

We suppose moreover that the intersection /C n "P is nonempty, so that the pro- 
jection onto the closed convex set /CnP exists and is unique (see e.g. HUL93] V 

The fact that V is defined by both equalities and inequalities does not re- 
ally matter in our developments. To simplify presentation, one may take only 
equalities, so that V is an affine subspace. We prefer to keep the above loose 
notation with both equalities and inequalities, because it is closer to projection 
problems arising in practice, and because it does not impact the basics of pro- 
jection algorithms. Adding positive slack variables for the inequalities allows 
us to reformulate the problem as a projection onto the intersection of an affine 
space with a cone of the form JC x (K+)'"'. 

We note that in general one can project onto a polyhedron V. For the case 
when there are only (independent) equalities in the definition (i.e. if 7-" = ^ is 
an affine subspace of the equation Ax = b with a full-rank matrix A), we have 
the explicit expression of the projection of x 

Vro]j^{x)=x- A^[AA^]-\Ax^b). (3) 

For a general polyhedron V, we still can compute the projection Projp(a;) effi- 
ciently using quadratic programming solvers (see e.g. |NW99j or jBGLS03"] ). In 
this chapter, we make the practical assumption that it is also easy to project 
onto /C. Recall from above that we have an easy-to-compute expression of the 
projection for JC = it turns out to be also the case for the second-order cone 
(or Lorentz cone) 

£„ := {.T eK" : ||(a;i, . . . ,x„_i)|l <a:„}. 

Though it is easy to project onto V and also onto /C by assumption, the projec- 
tion onto the intersection V r\K. can still be challenging. The difficulty comes 
from the presence of both (affine and conic) constraints at the same time. We 
will see in Section [5] that many numerical methods to compute the projection 
onto the intersection use combinations of projections onto V and K, separately. 

The geometrical projection problem has an obvious analytical formulation 
as a least-squares problem, namely minimizing the (squared) norm subject to 
the conic constraints and the affine constraints: 

\ xeVr\K.. ^ ' 

For example, an important subclass of such problems are semidefinite least- 
squares problems (i.e. when /C = S^)'- 

min \\\X-Cf 

{A,,X) = {or<)h, i = l,...,m (5) 
X ^ 0, 
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for C,Ai G Sn- For example, the nearest correlatfon matrix problem ^ is an 
instance of this later class. Notice finally that problems (|4]) and ([5]) coincide 

2 

formally with x € K" collecting the rows of X G Sn- This also explains the 
slight abuse of notation when writing JC = ■ We use this ambiguity x o 
X in particular in Section |4] to ease presentation of relaxations of polynomial 
optimization problems. 

1.0.4 Linear conic optimization problem 

The second development of this chapter is about a more standard topic: solving 
linear conic optimization problems. With the above notation, these problems 
can be expressed as 



As presented in this handbook and as well as in the first handbook |SVWOO| . 
linear conic programming has been a very active field of research spurred by 
many applications and by the development of efficient methods. 

In this chapter, we explain how conic projections can be used to develop a 
new family of algorithms for solving linear conic problems. In fact, the pro- 
jection problem ([4]) can be written as a linear problem of the form (j6|) and 
then can be solved using usual conic programming solvers (we come back to 
this at the beginning of Section [5] and we explain why it is not a good idea 
to do so). However we will also show the other way around: Section l3.ll ex- 
plains that one can also solve the linear conic problem ^ by solving projection 
problems ([3]), more precisely with a succession of (truncated) projection-like 
problems. So-called regularization methods are presented, discussed and illus- 
trated on solving semidefinite relaxations of combinatorial optimization and 
polynomial optimization problems having many constraints. 

1.0.5 Polynomial optimization 

Over the last decade, semidefinite programming has been used in polynomial 
optimization, namely for deciding whether a multivariate real polynomial is 
nonnegative, or, more generally, to minimize a polynomial on a semialgebraic 
set (a set described by polynomial inequalities and equations). A hierarchy of 
embedded linear semidefinite relaxations (of the form ^) can be constructed to 
generate a monotone sequence of bounds on the global minimum of a polynomial 
optimization problem. Asymptotic convergence of the sequence to the global 
minimum can be guaranteed under mild assumptions, and numerical linear al- 
gebra can be used to detect global optimality and extract global minimizers. 
The theory is surveyed in [LauOQj and |Las09) : the potential applications are 
numerous (see e.g. in control theory |HG05j or signal processing [Dum07| V 

Section [5] reports numerical experiments showing that regularization algo- 
rithms based on projections outperform classical primal-dual interior-point al- 
gorithms for solving semidefinite relaxations arising when deciding whether a 
polynomial is nonnegative, and for globally minimizing a polynomial. 




(6) 
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1.0.6 Objectives and outline of this chapter 

This chapter focuses on projection problems that have a simple geometric appeal 
as well as important applications in engineering. We give references to some of 
these applications, and we give emphasis on polynomial optimization. 

The first goal of this chapter is to sketch the different approaches to solve ^ . 
Section [2] is devoted to this review, with an emplasis on dual methods (in Sec- 
tion 12. 2p . The bottomline is that as soon as we can easily project onto K, (we 
have in mind and <S+ as well as direct products of these), we have efficient 
algorithms to project onto the intersection ICnV. 

The second goal of this chapter is to explain how to use these conic projec- 
tions to build a family of "regularization" methods for linear conic programming. 
The approach uses standard optimization techniques (proximal algorithms and 
augmented Lagrangian methods) and has been recently developed for the case 
/C = iS+ . Section [3] presents it in a general framework and underlines the role 
of conic projections. The final section presents some numerical experiments 
with regularization methods on polynomial optimization problems, showing the 
interest of the approach in that context. 

This chapter is meant to be an elementary presentation of parts of the ma- 
terial of several papers; among those, our main references are |Mal04j . |QS06| , 
IMPRW09] . [HMllj . |ZST10) and |Nie09| . We aim at clarifying the ideas, pre- 
senting them in a general framework, unifying notation, and most of all, pointing 
out what makes things work. To this purpose, we have to elude some technical 
points; in particular, we discuss algorithms, but we do not to give convergence 
results. We try to give precise references throughout the text on these lacking 
points. 

2 Conic projections: algorithms and applications 

This section reviews the methods for solving the conic projection problem ([4]), 
presenting them in chronological order. We sketch the main ideas and give 
references; we do not get into much details. Discussions about convergence 
issues and numerical comparisons are beyond the scope of this section. 

Beside interior-point methods, the basic idea of all the approaches is to some- 
how separate the two constraint-sets /C and V and to use the projections onto 
them successively: this is obvious for alternating projections and alternating 
directions methods; it is also the case for dual methods (we focus on this latter 
method in Section r2.2p . The point is that we can solve the conic projection 
problem (|4|) efficiently (by dual algorithms in particular). 

To simplify presentation, we stick here with the projection problem ([4]), but 
the approach generalizes in two directions. First, we could replace the cone /C by 
any closed convex set: in this more general case, the developments are similar, 
with slightly more complicated expressions of dual objects (a related work is 
|MU88| ). Second, we could consider problems with strongly convex quadratic 
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objective functions, such as 

J min {x — c)^ Q{x — c) + X 

\ xeicnr ^' 

with Q positive definite. Such problems can be phrased as projection problems 
with respect to ||x||q = \fxTQx the norm associated to Q. The practical 
technical assumption is then that one can project onto /C with respect to || • ||q 
(which is not easy in general). 

2.1 Computing conic projections 

2.1.1 Using linear conic programming 

A tempting method to solve (j4]) is to cast this projection problem as a usual 
linear conic programming problem, so that we can use the powerful tools de- 
veloped for this case. There are several ways to do so; a simple one consists in 
pushing down the objective function with an additional variable t: Q is indeed 
equivalent to linear conic program 

min t 

x<^r 

{ 

X ~ c = z 

{X, (z,t)) G /C X Cn+l 

where the variable z € M" is then introduced to express the additional second- 
order cone constraint appearing in the constraints. This problem can be readily 
given to usual conic solvers, for example interior-points methods, like SeDuMi 
|Stu99| or SDPT3 |TTT03| under Matlab. Unfortunately, adding {z,t) makes 
the computational cost and memory space needed by a standard primal-dual 
interior-point method increase, and numerical testing confirms that the method 
is not viable in general (as mentioned e.g. in |Hig02| , jToh08| ). 

We note furthermore that the projection problem (j4]) is a quadratic conic 
programming problem, hence a special case of nonlinear conic optimization prob- 
lems. We could solve (|4]) by algorithms and software devoted to nonlinear conic 
optimization problems such as the penalization method of |KS03) . However 
those methods would not use the special structure of (jl]), and as the above ap- 
proach by linear conic programming, they would be efficient only for small-size 
projection problems. The projection problems are important enough to design 
algorithms specifically to them, as presented in the sequel. Note that we are 
not aware of a tailored penalization algorithm for ([4]). 

2.1.2 Alternating projections 

The alternating projection method is an intuitive algorithmic scheme to find a 
point in the intersection of two sets: it consists in projecting the initial point 
onto the first set, then projecting the new point onto the second set, and then 
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projecting again the new point onto the first and keep on projecting alterna- 
tively. In other words, it consists in repeating; 

f Xk+i = Proj^(j/fc) 
1 Vk+i = Vvo]-p{xk+i) 

If the two sets have a "regular" intersection, this algorithm converges linearly 
to a point in P fl /C and we know the speed of convergence (for two convex sets, 
see e.g. |Deu01] : for the general case, see the local result of |LLM09] ). 

We can modify this simple alternating projection scheme by adding a cor- 
rection step (called Dykstra's correction [Dyk83| ) at each iteration (|8]) 

Xfc+i = Proj;c(^fc) 

Vk+i = Projp(a;fe+i) (9) 
Zk+i ^ — (ajfe+i — Uk+i)- 

This modification ensures the convergence of the sequence {xk)k to the projec- 
tion Vvo]f^fYp[c) - and not only to a point in the intersection K.C\'P . This ap- 
proach was proposed by |Hig02| for the nearest correlation matrix problem ([2]). 
It generalizes to (jlj since it is easy to project onto V and we assume that it 
is the same for JC. We will see that dual methods and alternating direction 
methods can be interpreted as variants of this basic geometrical method. 

2.1.3 Dual methods 

The conic programming problem ((4]) looks more complicated than a usual conic 
programming problem with linear function instead of a norm as objective func- 
tion. It turns out that the strong convexity of the objective function provides 
nice properties to the dual problem that can then be solved efhciently. 

The dual approach was proposed for the conic least-squares problem ^ in 
|Mal04] ■ later revisited by |BX05| for the case of /C = 5+, and then enhanced 
by |QS06| and [BH08| for the projection onto correlation matrices. In the next 
section, we give more details and more references about this approach. 

2.1.4 Interior points 

As a convex optimization problem, Q can be attacked with the interior-point 
machinery [NN94| . assuming that both the cone K, and its polar cone 

K° {s e M" : s^x < for all xeK] 

are equipped with so-called self-concordant barriers (as is the case for CmS^). 
The approach consists in solving perturbed optimality conditions of (jl]) . As any 
projection problem, notice that the optimality condition is 

x^rr\K, (c~x)^ {x-x) <Q, ior all X e V CMC. 
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To write down the optimality conditions more concretely, let us make explicit 
the affine constraints with the help of Ae e R"^'"^ and Ai e R"x™i as 

min — c|p 

Aex = 6e, Aix < bi (10) 
X G AC. 

Under a non-degeneracy assumption (e.g. Slater condition, see next section), 
the optimality conditions of (jlOp give the complementarity system 

X - C + U + Ae^U + Ai^ z 
Aex = &e, ye M™^ 
Aix <bi, ze Rl'\ z'^iAix - 6i) = 
X elC, u G /C°, u^x = 0. 

Roughly speaking, an interior-point approach consists in perturbing the com- 
plementary equations above and keeping other equations satisfied. (We will see 
that the forthcoming dual approach goes exactly the other way around.) A 
first interior-point method is proposed in |AHTW03] for the nearest correla- 
tion matrix problem Interior-point methods for general quadratic SDP are 
introduced and tested on projection problems in [TTTOGj and |Toh08) . 



2.1.5 Alternating directions 

The alternating direction method is a standard method in variational analy- 
sis (see e.g. |GM76j ). going back to [DR56j . This method was proposed by 
|HXY09j for solving the semidefinite projection problem ([5|) and by |SZ10j for 
more general quadratically constrained quadratic SDP. The idea of the method 
is to exploit the separable structure of the problem, as follows. Let us duplicate 
the variables to write the equivalent problem 

min i||a;-c||2-f i||y-c||2 

x^y (11) 

xelC, yeV. 

The alternating direction method applied to ([TT|) gives the following scheme: 
consider the augmented Lagrangian function 

L{x,y;z) ^ - cf + ^\\y- cf -{z,x-y) + ^\\x ~ yf; 

the minimization of L with respect to primal variables (a;, y) is decomposed in 
two steps, so that an augmented Lagrangian iteration is 

xu+\ = argmin^g^ L(x, yk, Zk) 
yk+i = argmin^gp L{xk+i,y, Zk) 
Zk+i = Zk - I3{xk+i - 2/fc+i)- 
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It is not difficult to prove that the two above minimizations boil down to pro- 
jections, more specifically 

Ti • (fiyk + Zk + c\ . fl3xk+i-Zk + 
x.+i = Proj^i j, y^+i = Proj^( 

Thus the approach alternates projections onto V and JC to compute the projec- 
tion onto ICnV; it can thus be seen as a modification of the simple alternating 
projection scheme ([8]), with the same fiavour as Dykstra modification ([9]). 

2.2 More on dual approach 
2.2.1 Apply standard machinery 

Let us give more details about the dual approach for solving (jH). Following 
|Mal04] , we apply the standard mechanism of Lagrangian duality to this prob- 
lem; we refer to |HUL93[ Ch. XII] and |BV041 Ch. 5] for more on this mechanism 
in general. 

Let us consider the more explicit form (|10p . and denote also hy A := [Ae; Ai] 
and b := [feg; bi] the concatenation of the affine constraints. We dualize affine 
constraints only: introduce the Lagrangian, a function of primal variable a; G /C 
and dual variable {y, z) e M™^ x M™^ 

L{x; y, z) i ||c - - {Aex - 6e) - z'^ {Aix - bi), (12) 

and the corresponding concave dual function 

0{y,z) :^ mm L{x;y,z), (13) 

which is to be maximized. There is no more affine constraint in the above 
minimum, and it is easy to prove ( |Mal04( Th.3.1]) that the problem corresponds 
to a projection onto /C: there exists a unique point which reaches the above 
minimum, namely 

x{y, z) := ProjK(c + A^J y + A^ z), (14) 

so we have 

0(y, z) = bE^y + b^^z + \{\\cf - ||x(y, z)f). (15) 

It is also not difficult to show |Mal041 Th.3.2] that the concave function Q is 
differentiable on M™, and that its gradient 

^B{y,z) = -Ax{y,z) + b (16) 

is Lipschitz continuous. As any function with Lipschitz gradient, Q is twice 
differentiable almost everywhere, but not everywhere (this basically relies on 
the differentiability properties of Proj^; for the case K = 5,^, see more in 
[SS02] and |MS06j among others). 
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The dual problem is thus 



{ 



max 9{y, z) 
(y, z) e X M: 



,mi 



(17) 



Strong duality (the optimal values of ([TU]) and PTl) coincide) holds under a 
standard assumption in convex optimization. The so-called (weak) Slater as- 
sumption (see e.g. |Ber9 5l. |H UL93 I) is in our context: 




In fact, this assumption yields moreover that there exists solutions to p7|) (note 
that the assumption has also a natural geometrical appeal in context of projec- 
tion methods, see |HM111 Sec. 3]). Finally we get directly the projection from 
dual solutions: let {y*,z*) be a (dual) solution of ([T7)) . the (primal) solution x* 
of (HI) is the associated x* = x{y*,z*) (see |Mal04[ Th.4.1]). 

2.2.2 Apply standard algorithms 

To compute the projection of c onto V fMC, we just have to solve the dual 
problem p7)) . Let us have a closer look to this problem: the constraints are 
simple positivity constraints on the variable corresponding to the dualization 
of inequality constraints; the dual function is a differentiable concave function 
with Lipschitz gradient. This regularity has a huge impact in practice: it opens 
the way for using standard algorithms for nonlinear optimization. Hence we 
can use any of the following numerical methods to solve ((T7)) (as soon as the 
software can deal with the constraints Zi > 0): 

1. gradient methods: standard methods |Ber95] or more evolved ones, as 
e.g. Nesterov's method INesOS] : 

2. Newton-like methods: quasi-Newton, limited memory quasi-Newton, in- 
exact Newton, Newton-CG, see textbooks |NW99| and |BGLSQ3| - with 
the restriction that 9 is not twice differentiable everywhere, so that we 
have to use the so-called semismooth Newton methods, see |QS93| . 

For example, |Mal04] uses a quasi-Newton method for solving ([S]), and |QS06| 
uses a semismooth inexact Newton method for solving We come back on 
these two methods in the next section to give more practical details. 

We also mention here the so-called inexact smoothing method of |GS09j 
which consists in writing the optimality conditions of the dual problem (jl7p 
as a nonsmooth fixed point problem (and solving it by combining smoothing 
techniques and an inexact Newton method; see e.g. |NW99) ) . 

The dual problem ([TTl) can thus be attacked with classical tools or more 
evolved techniques. In practice, the choice of the solving method depends on 
the structure of the problem and the target level of sophistication. 

We call dual projection methods any method using an optimization code for 
functions with Lipschitz gradient to maximize 9 on M."'-'^ x R'^'^ Specifically, a 




BxeVDmt IC. 



(18) 
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dual projection method generates a maximizing dual sequence {yk, Zfcjfc together 
with the primal sequence Xk = x{yk, Zk) such that: 

%fc,^fc) = bE^Vk + bi^Zk + ^iWcf -\\xk\\^) (19) 
VO{yk,Zk) = -Axk+b. (20) 

We notice that in our numerical experiments with dual methods, we have 
observed better behaviour and convergence when the (strong) Slater assumption 
holds (that is, when (|18l) holds and moreover is full rank). 

2.2.3 More algorithmic details (for the case without inequalities) 

We detail now further some algorithmic issues. To simplify we focus on the case 
without inequalities (toj — 0, no dual variables z). Iterations of most algorithms 
for maximizing 6 can be written as 

Vk+i =yk + TkWkVe{yk). (21) 

Note that the usual stopping test of these methods has an intrinsic meaning: a 
threshold condition on the gradient 

\\V9{yk)\\^\\Axk~b\\<e (22) 

controls in fact the primal infeasibility. Among these methods, let us discuss 
further the three following ones. 

Gradient descent with constant step-size. We have a remarkable result: 
the gradient method in an adapted metric, namely (j2ip with 

Wk = [AA^]-^ and = 1, (23) 

corresponds exactly to the alternating projection method © (see [Mal04] for 
a proof in the special case of correlation matrices, and |HM11) for the proof 
in general) . We thus have a (surprizing) dual interpretation of the primal pro- 
jection method. Using descent schemes more evolved than a simple gradient 
descent (see below) then leads to (dual) projection methods that can be seen as 
improvements of the basic alternating projection method. 

BFGS Quasi-Newton method. The method is known to be very efficient 
in general, and have many industrial applications (one of the most striking is 
in weather forecasting [GL89| ). The method can be readily applied to the dual 
problem, since it requires no more information than (|19|): Wk is constructed with 
successive gradients with the BFGS formula and is well-chosen with a Wolfe 
line-search (see e.g. |BGLS03] ). The initial paper about dual methods |Mal04j 
proposes to use this method in general and reports very good numerical results 
on the nearest correlation matrix problem ([2]) . Since then, this dual method has 
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been used successfully to solve real-life projection problems in numerical finance 
(among them: the problem of calibrating covariance matrices in robust portfolio 
selection jMal041 5.4]). A simple Matlab implementation has been made publicly 
available together with |HM11] for pedagogical and diffusion purposes. 

Generalized (or semismooth) Newton. A pure Newton method would be 
to use Tfc = 1 and Wk = [Hf.]^^ with the Hessian Hk of 9 at the current iterate 
yk ■ In practice, an inexact generalized Newton method is used for the following 
reasons. 

As mentioned earlier, 9 is differentiable but not twice differentiable (though 
its gradient is Lipschitz continuous). We can still replace the usual Hessian 
by a matrix Hk € d^9{yk) the Clarke generalized Hessian of 9 at yk [Cla83| . 
Computing a matrix in d'^9{yk) C 5+ amounts to computing an element of the 
Clarke generalized Jacobian of the projection onto the cone dcP'coix: since we 
have (see |HUSN84n 

d!9{yk) = A9,Proj^(c + ^V)^^- 

We can often compute an element of dcProij^. For example, we even have an 
explicit expression of the whole dc Proj^+ |MS06| . 

For overall efficiency of the method, the Newton direction dk is computed by 
solving the system Hkd = 'V9{yk) approximately, usually by conjugate gradient 
(CG) type methods. More precisely, the idea of so-called Newton-CG (also 
called inexact Newton going back to |DET82| ) is to stop the inner iteration of 
CG when 

\\Hkd + V9{Xk)\\<Vk\\V0{Xk)\\ (24) 

with small rjk (see e.g. jNW99| ) . Note that preconditioning the Newton system 
is then crucial for practical efficiency. The nice feature of this algorithm is that 
Hk has just to be known through products Hkd so that large-scale problems 
can be handled. In our context, the main work on this method is QS06, about 
the nearest correlation matrix problem; we come back to it in the next section. 

We finish here with a couple of words about convergence of this Newton dual 
method. In general (see |QS93, ), the two conditions to prove local superlinear 
convergence are that the minimum is strong (i.e. all elements of the generalized 
Hessian are positive definite), and the function has some smoothness (namely, 
the so-called semismoothness). In our situation, the two ingredients implying 
those conditions are the following ones: 

• The intersection has some "nondegeneracy" , in the sense of |BSOO[ 4.172] 
and |AH097i Def. 5]. This allows us to prove d^9{yk) >- (see e.g. [Q506] 
for a special case). 

• The convex cone /C has some "regularity" . An example of sufficient reg- 
ularity is that K. is a semialgebraic set (i.e. defined by a finite number of 
polynomial (in)equalities). Indeed for semialgebraic convex sets, the pro- 
jection Proj^i; and then 9 are automatically semismooth [BDLOSI (which 
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is the property needed to apply the convergence results of |QS93| . This is 
the case for direct products of the cones £„ and 5+ (for which we even 
have strong semismoothness |SS02) so in fact quadratic convergence). 

2.2.4 Illustration on nearest correlation matrix problem 

We give a rough idea of the efficiency of the dual approach on the projection 
problem ([2]). The first numerical results of ,Mal04. Sec. 4] show that the dual 
approach copes with large-scale problems, in reporting that one can solve in a 
couple of minutes projection problems of size around one thousand. By using 
the dual generalized Newton method (instead of quasi-Newton as in [Mal04] ) . 
the algorithm of |QS06| , improved later by |BH08| , gives very nice results in both 
practice and theory. Nondegeneracy of the constraints and then of the gener- 
alized Hessian is proved in [QSOG, Prop. 3.6]: as recalled above, this technical 
point leads to quadratic convergence of the method |QS06[ Prop. 5.3]. 

Today's state of the art is that one can solve nearest correlation matrix 
problems of big size (say, up to 4000-5000) in a reasonable amount of computing 
time (say, less than 10 minutes on a standard personal computer). The only 
limitation seems to be the memory constraint to store and deal with dense 
large-scale data. 

To give a more precise idea, let us report a couple of results from |BH08] . 
The implementation of their dual algorithm is in Matlab with some external 
Fortran subroutines (for eigenvalues decomposition in particular). The stopping 
criterion is set to 

||V%fc)|| < 10-V (25) 

We consider the nearest correlation matrix problems for two (non-SDP) matrices 
with unit diagonal (of size ni = 1399 and n2 — 3120) provided by a fund 
management company. The dual method solves them in around 2 and 15 min., 
respectively, on a very standard machine (see more details in |BH08j ). 

We finish with a last remark about accuracy. The approximate correlation 
matrix X that is computed by such a dual method is often just what is needed in 
practice. It might happen though that a special application requires a perfect 
correlation matrix - that is, with exactly ones on the diagonal, whereas X 
satisfies only (by (^51 ) 

i=l 

A simple post-treatment corrects this. Setting diagonal elements to ones may 
destroys the positiveness, so we apply the usual transformation that computes 
the associated correlation matrix X from a covariance matrix X, namely 

^^^-1/2^^-1/2 D = diag(X). 

This operation increases the distance from C ; but the error is still under control 
(by e/(l - e); see (BHOSI Prop. 3.2]). 
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2.3 Discussion: applications, generalizations 
2.3.1 Direct or indirect applications 

Conic projection problems with the positive semidefinite cone (like K, — 5+ , /C = 
<S+ X (IR+)P or /C = 5+ X • • • X 5+^) are numerous in engineering. Constructing 
structured semidefinite matrices, for example, are naturally modeled this way. 
Such problems naturally appear in finance for constructing structured covariance 
matrices (as a calibration step before simulations); they also appear in many 
other fields, such as in control (e.g. jLJ09j ). in numerical algebra (e.g. |AH07j ). 
or in optics (e.g. |NWV08| ), to name a few of them. 

Conic projections also appear as inner subproblems within more involved 
optimization problems. Solving efficiently these inner problems is often the key 
to numerical efficiency of the overall approach. Let us give some examples. 

• Linear conic programming. So-called regularization methods for solv- 
ing ([6]) use the conic projection problem as an inner subproblem; these 
methods are studied in Section [31 

• Weighted projections. For given weights Hij > 0, consider the semidefinite 
projection (O with a different objective function 

r min iE-,=i^^.(^..-C..)' 

< {A„X)^{or<)b,, i = l,...,m 

[ XhO- 

An augmented Lagrangian approach for this problem |QS10| produces a 
projection-like inner problem, which is solved by a semismooth Newton 
method (recall the discussion of the previous section). 

• Low-rank projections. Consider the semidefinite projection problem ([5]) 
with additional rank-constraint 

min i||X-C||2 

{A,X) = {oT<)b„ i = l,...,m (26) 
X to, rankX = r. 

This difficult non-convex calibration problem has several applications in 
finance and insurance industry (e.g. pricing interest rate derivatives for 
some models; see e.g. |BM06) ). Two approaches (by augmented Lagrangian 
[LQ10| and by penalty techniques |GS10| ) have been recently proposed 
to solve these types of problems; both approaches solve a sequence of 
projection-like subproblems. The numerical engine is a dual semismooth 
truncated Newton algorithm for computing projections. 

For these applications of conic projections, the techniques and the arguments 
are often the same, but are redeveloped for each particular projection problem 
encountered. We hope that the unified view of Section 2 can bring forth the 
common ground of these methods and to better understand how and why they 
work well. We finish this section by pointing out an easy geometrical application. 
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2.3.2 Application for solving conic feasibility problems 

The conic feasibility problem consists simply in finding a point x in the inter- 
section /C n P. Many engineering problems can be formulated as semidefinite 
or conic feasibility problems (for example in robust control jBGFB94] where an 
element in the intersection is a certificate of stability of solutions of differential 
equations). Section [4.21 focuses on semidefinite feasibility problems arising when 
testing positivity of polynomials. We refer to the introduction of [ HMllj for 
more examples and references. 

A simple and natural technique for solving conic feasibility problems is just 
to project a (well-chosen) point onto the intersection JC HP (by dual projec- 
tion methods for example). In [HMllj . a comparative study of such a conic 
projection method with the usual approach using SeDuMi was carried out pre- 
cisely on polynomial problems. It was shown there that an elementary Matlab 
implementation can be competitive with a sophisticated primal-dual interior- 
point implementation. This would even have a better performance if an initial 
heuristic for finding a good point to project could be determined (the numerical 
experiments of [HMlll Sec. 6] simply use c = 0). An answer to this latter point 
is provided by the regularization methods of the next section. 

3 Projections in regularization methods 

We focus in this section on standard linear conic programming. We show that, 
following classical convex optimization techniques, conic projections can be used 
to solve linear conic programming problems. 

There exist many numerical methods for solving linear conic problem ^ 
(see the first handbook [S VWOO] ) . But on the other hand, there also exist big 
conic problems, and especially big SDP problems, that make all the standard 
methods fail. Relaxations of combinatorial optimization problems and polyno- 
mial optimization problems yield indeed challenging problems. This motivates 
the development of new algorithmic schemes. 

The strategy that we present in this section exploits the efficiency of projec- 
tion methods by developing proximal algorithms for linear conic programming. 
We generalize the developments of |MPRW09] , and give all way long references 
to related works. As for numerical aspects, the target problems are semidefi- 
nite programs with the number of constraints possibly very large (more than 
100,000). 

3.1 Proximal method for linear conic programming 

3.1.1 Apply classical techniques of convex optimization 

The proximal algorithm is a classical method of convex optimization and vari- 
ational analysis: it goes back from the 1970s with premises in |BKL66| . the 
first work |Mar70| and the important reference jRoc76b| . The driving idea of 
the proximal algorithm is to add quadratic terms to the objective function to 
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"regularize" the problem (ensuring existence, uniqueness, and stability of solu- 
tions) . A (primal) proximal method of the linear conic problem (|6]) goes along 
the following lines. 

Consider the problem with respect to {x,p) 

niin c^x + 5j: llx — p|p 

p e M", X eVnJC. 

By minimizing first with respect to p, we see that this problem is equivalent to 
the primal linear conic problem (jH]). We have added to the objective function a 
quadratic "regularizing" term ||a; — with the so-called "prox-parameter" t. 
The idea now is to solve this problem in two steps: first with respect to x, and 
second to p: 

min / min x + ^\\x - pW^ \ , . 

peM" xevnic )' ^ ' 

The outer problem is thus the minimization with respect to p of the function 

which is the result of the inner optimization problem parametrized by p. As 
such defined, F is the so-called Moreau-Yosida regularization of the function 
x — ?• X + i-pnicix) the linear objective function plus the indicator function of 
the intersection (see e.g. |HUL931 Ch.XV.4]). 

The connection with the previous developments of this chapter is then obvi- 
ous: the above inner problem is essentially a projection problem as studied in the 
previous section (see ©). The solution of the inner problem (the "projection") 
is called the proximal point and denoted 

„ , . f argmin c^x + i^\\x—pP 

Note that, for simplicity, the dependence of F and Prox with respect to t is 
dropped in notation. 



3.1.2 Primal proximal algorithm for conic programming 

Applying basic convex analysis properties, it is easy to prove (see e.g. |HUL93[ 
Ch.XV.4]) that the Moreau-Yosida regularization F is convex and differentiable 
with gradient S7F{p) = (p — Prox(p))/t. The optimality condition of the uncon- 
strained minimization of F is then simply 

p = Prox(p). (29) 

Moreover a fixed-point p of the Prox operator is also a solution of the initial lin- 
ear conic problem ([6]) : observe indeed that p is feasible and reaches the optimal 
value, since 

val ® = mini^(p) = F{p) = c^p. (30) 
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A (primal) proximal algorithm for solving ([SJ then consists of a fixed-point 
algorithm on (|29)) 

Pk+i = Prox(pfe). (31) 

Since computing Prox(pfc) corresponds to solving a projection problem, we can 
use any of the algorithmic schemes described in Section 12.11 to implement pil) 
inside of this proximal algorithm. We call the proximal method the outer al- 
gorithm, and the chosen projection algorithm, the inner algorithm. We study 
in Section [3] the family of proximal algorithms obtained when dual projection 
algorithms of Section [2?2] are used as inner algorithms. 

As we have an iterative optimization algorithm (inner algorithm) inside of 
another iterative algorithm (outer algorithm) , the question of the stopping tests 
of the inner algorithm is obviously crucial. For practical efficiency, the inner 
stopping test should somehow depend on some outer information; we come 
back later in detail to this important point. 

So in fact the iteration (1311) is not carried out exactly, and replaced instead 
by a looser implementable relation 

||pfe+i-Prox(pfe)|| <efe. (32) 

Whatever is the inner projection algorithm, we have the general global conver- 
gence of the method under the assumption that the sequence of errors goes 
rapidly to zero. 

Proposition 1 (Global convergence) Assume that there exist a solution to 
If (ife)fe is bounded away from and if the primal proximal algorithm generates 
a sequence {pk)k such that 

k 

then {pk)k converges to a solution p of (j6|). 

Proof: The result is straightforward from the general convergence result of 
proximal algorithms. As a consequence of (j33p and the existence of a solution 
to (|ni), the sequence (pk)k is bounded and we can apply jRoc76b[ Th.l]: {pk)k 
converges to a fixed-point to Prox which is a solution of ([6]) by (j30| . □ 



3.1.3 Dual point of view: augmented Lagrangian 

We give here some details about the dual interpretation of the above primal 
algorithmic approach. It is known indeed that a proximal method for a problem 
corresponds exactly to an augmented Lagrangian method on its dual; we detail 
this for our case. To simplify writing duals, we abandon the general formulation 
([6]), and we suppose that there is no afhne inequalities (or that there are in- 
corporated with slack variables in K,). So we work from now with the standard 
form of primal and dual linear conic problems 

min c^ X ( max b^y 

Ax = b and < A'^y - w - c = (34) 

xeIC [ w e /C°. 
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Augmented Lagrangian methods are important classical regularization tech- 
niques in convex optimization (see |PT72] . |Roc76aj for important earlier ref- 
erences, and |HUL931 Chap. XII] for the connection with usual Lagrangian du- 
ality). In our situation, a dual augmented Lagrangian method goes along the 
following lines. Introduce the augmented Lagrangian function L with parameter 
< > 0, for the dual problem ((Ml) : 

L{y,u;p) := b'^y -p'^{A^y - u - c) - ^\\A^y cf . 

Note that this is just the usual Lagrangian for the problem 

r max b'^y - ^\\A^y -u~ cf , , 

\ A^y - u - c = 0, ueJC°, ^ ' 

that is the dual problem with an additional redundant quadratic term in the 
objective. The convex (bi)dual function is then defined as 

Q(j)) :— max L{y,u]p). (36) 

The bridge between the primal proximal method and the dual augmented La- 
grangian is set in the next proposition, formalizing a well-known result. 

Proposition 2 With notation above, we have &{p) = F{p) for p G MP' . 

Proof: Just apply |HUL93i XII. 5. 2. 3]: the augmented Lagrangian function 
Q{p) is the Moreau-Yosida of the usual dual function, which is here 

P + i{Ax=b}nK{p) ^ max b^ y - p^ {A^ y - u - c). 

This is exactly F{p) defined by ([28| (in the case when V is just the affine 
subspace of equation Ax ^ b).D 

The primal regularization by proximal approach and the dual augmented 
Lagrangian regularization thus correspond exactly to the same quadratic regu- 
larization process viewed either on the primal problem or on the dual (1341) . 

The developments of this section share similar properties with other aug- 
mented Lagrangian-type approaches for conic programming, among them: a 
primal augmented Lagrangian in |BV06| , a primal-dual augmented Lagrangian 
in |JR08| and a penalized augmented Lagrangian in |KS07) . 

3.2 Regularization methods for linear conic programming 

In this section we give more details on primal proximal algorithms (or dual 
augmented Lagrangian algorithms) that use dual projection methods as inner 
algorithms to carry out the proximal iteration p2|) . This family of algorithms 
is introduced for the case /C = 5+ in |MPRW09] . They are called regularization 
algorithms (rather than proximal algorithms, which would focus on the primal 
point of view only); we keep this terminology here. This section is more technical 
and could be skipped at a first reading. 

Regularization algorithms for conic programming specialize on three points: 
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1. the dual projection algorithm to compute Prox(a::fc), 

2. the rule to stop this inner algorithm, 

3. the rule to update the prox-parameter tk- 

The third point is an inherent difficulty of any practical implementation of prox- 
imal methods (e.g. bundle methods, see |CL93) V We are not aware of general 
techniques to tackle it. So we focus here on the first two points. 

3.2.1 Dual projection methods as inner algorithms 

We could use any dual projection algorithm of Section 12.21 to solve 

J min c^x + 5j ||x — 
[ Ax — b, X E IC. 

Embedded in a proximal scheme, a dual projection algorithm would lead to the 
forthcoming overall algorithm for solving linear conic problems p4|) . 

Note first that equations (fT4|) and (fTS]) for the projection-like problem ([37| 
become respectively 

x{y) = Proj^ {p + t{A^y^c)) (38) 

%) = 6^y+^(l|pf -Il^(2/)f)- (39) 

We use the (slightly loose) formulation (f2T|) of the iteration of dual projection 
methods to write a general regularization algorithm. We index the outer itera- 
tions by k and the inner ones by £. 

Algorithm 1 (Regularization methods) 

Outer loop on k stopped when ||j3fe+i ~Pfe|| small: 

Inner loop on £ stopped when \\Axe ~ b\\ small enough: 

Compute xi — Proj^(pfe + t^^i^A^ — c)) and gg — b ~ Axg 
Update ye+i — ye + ti We ge with appropriate re and We 
end (inner loop) 
Update Pk+i — xe (and tk) 
end (outer loop) 

We discuss several points about the above conceptual algorithm. 

• Memory. An important feature of regularization methods is the rather low 
memory requirement. The intrinsic operations of the algorithm are basi- 
cally the projection onto the cone and the multiplications by A and A^ . 
If the data has some structure, those multiplications could be performed 
efficiently (without constructing matrices). Moreover for maximizing 
(that is, essentially, implementing ye+i = Ue + TeWe ge), we could use 
algorithms of smooth unconstrained optimization adapted to large-scale 
problems and then requiring low-memory (as limited memory BGFS or 
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(37) 



Newton-CG, see e.g. |NW99) and IBGLSOSj ). We come back to this point 
later when discussing numerical issues. Roughly speaking, the point is 
the following: the computer memory can be used for storing problem data 
and the computation does not require much more extra memory. 

Inner restarting. At the outer iteration k, the inner projection algorithm 
can be initialized with the best yi of the previous iteration k — \. This has 
an intuitive appeal, so that in practice, i keeps increasing over the outer 
iterations. (Note also that the historical information on gradients may be 
carried out from iteration k — 1 to k as well.) 

Dual variable u. It is known that for any x € M", the projection onto the 
polar cone Proj^o(a;) is given by Proj^(a;) -\-Vm]jQa{x) = x (together with 
Proj^(a;)^Proj,c°(2;) = 0, see |HUL931 III.3.2.5]). When computing xt, 
we thus get automatically 

ut = Proj;c°(P*: + tk[A^Vi - c))/tk 

and it holds 

Pk + tk{A^ye- c) = tkui + xi. (40) 

Dual outer iterates. At the end of outer iteration k, we set (with a slight 
abuse of notation) yk+i = yi and Uk+i = ye for £ the final iteration of 
inner algorithm. Thus we have a sequence of primal-dual outer iterates 
{Pk,yk,Uk) G /C X R" X /C°. Under some technical assumptions, we can 
prove a convergence result of the same vein as Proposition [TJ any accumu- 
lation point of the sequence (pfc,j/fe,Ufc) is a primal-dual solution of (jlOp 
(see e.g. Theorem4.5 of |MPRW09j for a proof when K. = S+). 

Outer stopping test. We have already noticed in (j22p that the natural 
stopping test of dual projection algorithms controls primal infeasibility 
\\Axe — b\\. Interpreted as a fixed point iteration (j3ip . the natural stop- 
ping of the proximal algorithm is Hp/c+i — Pfe||; it turns out that this can 
interpreted as controlling dual infeasibility. Note indeed that (|40p yields 

Pk + tk{A^yk - c) = tkUk + Pk 

and then we have 

Ibfe-i-i -Pk\\ = tk\\A^yk -Uk- c||. 

Normal to interior-point methods. By construction, conic feasibility p JC 
(and u € K.°) and complementary x^u — are ensured throughout 
the algorithm, while primal-dual feasibilities are obtained asymptotically. 
In contrast, recall that basic interior-point methods maintain primal and 
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dual feasibility and the conic feasibility and work to reach complemen- 
tarity. Note also that, regularization algorithms give solutions that are 
usually on the boundary of the cone /C, since the primal iterates are con- 
structed by projections onto /C. In contrast again, basic interior-points 
give solutions as inside of the cone as possible. In a sense, regularization 
methods are then "normal" to interior point methods. 

• Overall stopping test. We have seen above that the natural outer and 
inner stopping rules of the regularization algorithm have a practical in- 
terpretation as dual and primal infeasibilities. Since complementary and 
conic feasibility are ensured by construction, the natural stopping test of 
the overall algorithm is 

max {\\Apk -b\\, \\A^yk - Uk - c\\} . (41) 

In practice, one should divide moreover the two infeasibilities by some 
constant quantities to get homogeneous ratios. 

3.2.2 Stopping inner iterations 

For which inner iteration £ can we set Pk+i = xg to proceed with the outer 
iteration? Since we have a loop inside of another, the rule to terminate the inner 
projection algorithm is indeed an important technical point for regularization 
algorithms. We discuss three strategies to set up inner stopping rules. 

Solving approximately the inner problem. The usual stopping inner rule 
in proximal methods is to stop inner iterations when the current inner iterate 
xi is close to the proximal point Prox(p/j). Doing this, the regularization algo- 
rithm approximates at best the conceptual proximal algorithm (which requires 
to solve the inner problem exactly) , so that we keep convergence properties (as 
in Proposition [1] for instance). 

This is the strategy followed by the regularization method of [ZSTIO] for 
K- = . This paper adopts the dual point of view (augmented Lagrangian) 
and uses semismooth Newton as dual projection algorithm for inner iterations 
(remember Section 12.21) . The regularization method thus combines the usual 
stopping strategy and an efficient inner algorithm (supporting large-scale prob- 
lems); it gives excellent numerical results on various semidefinite programming 
test-problems (see the last section of [ZSTlOj ). Under nondegeneracy assump- 
tions, we have moreover proofs of global and local convergences of the inner 
algorithm as well as the overall regularization method. 

Only one inner iteration; interpretation as saddle-point. An opposite 
strategy is to do only one inner iteration per outer iteration. This cautious 
strategy is motivated by the following remark. Let us come back to the proximal 
formulation ((27|) of the linear conic problem. Under Slater assumption ((T8)) . the 
inner projection problem can be replaced by its dual (recall (|15p . p7)) and 
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so that the primal and dual conic problems have the saddle-point 
formulation 

r min /max b'^y ^ ^^i\\pf - \\Proi^{p + t{A'^ y - c)f) 

With this point of view on the process, the choice of inner stopping conditions 
appears indeed to be crucial, because the inner and outer loops are antagonistic, 
as the first minimizes and the second maximizes. The idea of the "Arrow- 
Hurwicz" approach (see, for instance, |AHU59j ) is essentially to proceed with 
gradient-like iterations with respect to each variable successively. 

This is the strategy of the simple regularization method presented in [MPRWOQl 
Sec. 5.1]. Doing one inner iteration of (|2T|) with Wk = [AA^] and = 1/tk 
allows to simplify the regularization algorithm to just one loop with 

Pk+i = Projf^ipk + tk{A^ ye - c)) (with Ufc+i as by-product) (42) 
yk+i = yk + [AA'^]-\b~Apk)/tk. (43) 

We retrieve algorithm 5.1 of |MPRW09] by using the proof of proposition 3.4 
in there. This simple regularization algorithm has also an interpretation as an 
alternating direction method, see the chapter of this book devoted to them. 

In practice, it is important to note that AA'^ and its Cholesky factorization 
can be computed only once at the beginning of the algorithm. Even if this is an 
expensive task in general for problems that have many unstructured constraints 
(so that AA^ is big and unstructured), there exists some cases when AA^ 
is very simple, so that solving the system (|43|) is cheap. This is the case in 
particular when AA'^ is diagonal, as for SDP relaxations of max-cut problem, or 
fc-max-cut problem |GW95) . frequency assignment problems, see (BMZOli (5)], 
max-stable problem, see more below, and polynomial minimization problems, 
see forthcoming Section S) 

Something in-between. An attractive option is to find something in-between 
the previous two extreme strategies. Explicit rules for the management of in 
(|32p should be given, and for numerical efficiency they should be given online. 
Using off-line rules independent of the function is interesting in theory since it 
allows to get proof of linear convergence. Following usual paradigms of numeri- 
cal optimization, it would be possible to do better as soon as practical efficiency 
in concerned. An appropriate stopping rule still has to be found and studied; 
this is actually a general question and we are not aware of efficient techniques. 

Note finally that the practical implementation of the regularization method 
of [ZSTIO] does indeed something in-between: the simple scheme with one inner 
gradient iteration (second strategy) is used as a preprocessing phase before 
switching to making inner Newton iterations (first strategy). See more about 
this on numerical illustrations of the method in Section 2] A stopping test 
along the above lines would further enhance the numerical performance of the 
implementation. 
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3.2.3 Illustration: Computing Lovasz theta number 



We finish this section with a numerical illustration (borrowed from |MPRW09] 1 
of the performance of regularization methods on a classical combinatorial opti- 
mization problem. 

Lovasz [Lov79| proved the celebrated "sandwich" theorem in graph theory: 
the stable number a(G) of a graph G and the chromatic number x(G) of its 
complementary graph G are separated 

a{G) < 79(G) < x(G) 

by the optimal value of an SDP problem 

{max {lnxn,X) 
Xrj = 0, when (?;, j) is an edge of G (44) 
trace X = \, X ^ 0. 

As expected, it can be shown that this SDP problem is a formulation of the 
SDP relaxation of the max-stable problem. The stable number a{G) and the 
chromatic number x(G) are both NP-hard to compute and even hard to ap- 
proximate, so that the tractable 'i?(G) gives interesting information about G. 

Some graphs from the DIMACS collection [JT96| are very challenging in- 
stances for computing '&(G\ Those graphs have many edges and also many 
edges on the complementary, so that makes them the most difficult for standard 
methods (as noticed in [DR07| ). On the other hand, the structure of problem 
l|44p is very favorable for regularization methods and in particular for the simple 
one of MPRW09, Sec. 5.1] which is essentially (|12])-(|121). Observe indeed that 
the afSne constraints (|44p is given by "orthogonal" matrices (i.e. {^Aj^Ai) — 0), 
such that the matrix AA^ is diagonal. For illustration, the next table reports 
some of the bounds that were computed for the first time in [ MPRW09] . 



graph name 


n 


m 


AG) 


brock400-l 


400 


59723 


10.388 


kellerS 


776 


225990 


31.000 


brock800-l 


800 


207505 


19.233 


p-hat500-l 


500 


31569 


58.036 


p-hatlOOO-3 


1000 


371746 


18.23 



For more examples, see |MPRW09| Sec. 5.4] and }ZST10| Sec. 6.3]. 

4 Applications to polynomial optimization 

In this section, we illustrate the regularization methods for solving linear semidef- 
inite optimization problems in the context of polynomial optimization. We 
collect numerical experiments showing that regularization algorithms can be 
considered as an alternative to standard methods for deciding whether a poly- 
nomial is non-negative (Section 14.21) and for globally minimizing a polynomial 
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(Section 1331) • The point is thus the same as in |Nie09) which reports extensive 
numerical results using regularization methods for solving various large-scale 
polynomial optimization problems. 

We aim at giving here a methodology: our main focus is the generality of 
the approach and the reproducibility of experiments and results. We explain 
how to generate the test problems, we use public-domain implementations of the 
algorithms with default parameter tunings and with no attempt to adapt them 
to each particular problem instances (contrary to |Nie09) ). We do not carry 
out a comprehensive benchmarking with all methods and solvers; we just com- 
pare a widely used implementation of interior-point algorithm |Stu99j with two 
recent implementations of regularization methods (the basic one of [MPRW091 
Sec. 5.1], and the more sophisticated one of jZSTlOj V 

4.1 Sum-of-squares, SDP and software 

We briefly introduce in this section the notions and notation of polynomial 
optimization that we need. We refer to the recent surveys |Lau09j and [Las09) 
and to the other chapters of this book for more on this topic. 
Consider a multivariate polynomial of total degree 2d 

\a\<2d 

We use here the multi-index notation v" = v"^ ■ ■ ■ u^" where a £ runs over 
all nonnegative integer vectors of sum \a\ = ai + ■ ■ ■ + aN < 2(i. We say that 
p{v) is a sum-of-squares (SOS) of polynomials if one could find polynomials 
qk{v) such that 

p(z;) = ^gfe2(^)^ (4g) 

k 

It can be shown that finding such polynomials qk{v) amounts to a semidefinite 
feasibility problem. More specifically, if tt{v) denotes a vector of basis of poly- 
nomials of total degree less than or equal to d, finding an SOS decomposition 
(PS)) amounts to finding a so-called Gram matrix X G R"^" such that 

p{v) = 7r(u)^X7r(w) and X G 5+. (47) 

The set of SOS polynomials has thus a SDP representation of the form 

Ax^b, X eK. (48) 

where IC = A G M™^" is a linear operator depending only on the choice 
of basis 7r(ii), and b e M™ is a vector depending on p{v). For example if the 
vector of basis polynomials 7r(w) = [a;"]|Q|<d contains monomials x°' indexed by 
a G , then identifying powers of v in relation ([Tf]) yields 

Pa = {Aa,TT{v)n{v)^), for all a 
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where matrix Aa selects monomials x" in rank-one matrix Tr{v)'K{v) . More 
specifically, the entry in Aa with row index /3 and column index 7 is equal to 
one if /3 + 7 = a and zero otherwise. In problem (j48p . the row indexed by a in 
matrix A collects entries of matrix Aq, , and the row indexed by a in vector b is 
equal to Pa ■ Note that 

f N + d\ , f N + 2d\ 

n=(^ TV ) ^'''^ N j' 

so that the sizes of SDP problems grow quickly with the degree and the number 
of variables. 

The important remark is that this type of constraints are favorable to regu- 
larization methods: AA^ is always diagonal indeed. To see this, let a, /3 denote 
the row and column indices in matrix AA'^ . By construction, the entry (a, /3) 
in AA'^ is equal to (AajAjs): ii a — f3, this is equal to the number of non-zero 
entries in matrix Aa, otherwise, this is zero. Since it is important for numerical 
efficiency, we formalize the previous remark in a proposition. 

Proposition 3 (Orthogonality of constraints) Let A be the matrix in SOS 
semidefinite problem ^8^ - Then AA^ is diagonal with integer entries. 

Polynomial optimization problems that we consider in the next two sections 
are difficult to tackle directly but admit standard SOS relaxations involving 
constraints sets (|48p . In practice, an SOS relaxation approach boils down to 
solving linear semidefinite problems of the form 

J min c^x , , 

\ Ax^b, xeS+ 

where and c G R"^ b S R™, A e R™^"^, and vector x collects entries of 
matrix X £ R"^". For solving problem (|49|) . we use the three following the 
public-domain Matlab implementations: 

1. SeDuMil.3 implementing the primal-dual interior-point algorithm of jStu99) 
(available on sedumi . ie . lehigh . edu) 

2. MPRW a version of the basic regularization method of 'MPRW09{ Sec. 5.1] 
(available on www. math. uni-klu.ac.at/or/Software/mprw2.m) 

3. SDPNALO.l the regularization method of jZSTlO] 

(available on www.math.nus.edu.sg/~mattohkc/SDPNAL.html) 

Our goal here is just to show that the regularization methods are interesting 
in this context. We simply use default parameter tunings of the algorithms, 
with no attempt to adapt them to each particular problem instances contrary 
to in |Nie09| . With K.s=n, the calling sequences of the three Matlab functions 
SeDuMi, MPRW and SDPNAL for solving (gHJ are thus as follows: 
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pars = [] ; pars.tol = le-9; 
[x,y] = sedumi(A,b,c,K,pars) ; 
X = reshape(x,K.s,K.s) ; 

tol = le-9; C = reshape (c ,K . s ,K . s) ; 
[X,y] = mprw(A,b,C,le6,l,tol) ; 

opts = [] ; opts. tol = le-9; 
[blk,At,C,B] = read_sedumi(A,b,c,K) ; 
[obj,X,y] = sdpnal(blk,At,C,B,opts) ; 
X = X{1}; 

Experiments are carried out with Matlab 7.7 running on a Linux PC with 
Intel Xeon CPU W3520 2.67Ghz using 64 bit arithmetic and 8GB RAM. Com- 
putation times are given in seconds, with two significant digits only (since our 
objective is not a comprehensive accurate benchmarking of codes). 

Similarly to [NieOQj . we will see that, due to lower memory requirements, 
regularization methods can solve larger polynomial optimization problems than 
classical interior-point methods with the above setting. 

A last note about tolerance. The tolerance parameters tol for the three 
solvers are set to 10~® for all the numerical experiments (except otherwise 
stated). Notice though that the meaning of the tolerance parameter is not 
the same for two types of algorithms. With regularization methods, the relative 
accuracy measured in terms of primal and residuals (remember (|4ip) is easily 
controlled. We stress that lower requirements on the relative accuracy could 
result in a significant saving of computational time, and this could be useful 
when solving approximately large-scale problems with MPRW and SDPNAL 
(see some examples in |Nie09] ) . In contrast, we observe (as expected) that the 
iteration count of SeDuMi does not depend significantly on the expected accu- 
racy, measured in terms of duality gap. Most of the computational time is spent 
to generate an approximately feasible primal-dual pair with relatively small du- 
ality gap, and only a few more iterations are required to refine the accuracy 
below 10^^. 

4.2 Testing positivity of polynomials 

We focus in this section on the very first problem of polynomial optimization: 
we would like to know whether a polynomial (|45| is positive 

p{v) > 0, for all veR^. (50) 

In general this is a difficult problem for which no polynomial-time algorithm is 
known. It can be relaxed to the easier problem of testing if p could be expressed 
as an SOS Whenever it holds, then obviously condition ([50]) is satisfied. 

The converse is not true in general if iV > 2 and d > 3, and there are explicit 
counter-examples; the simplest of them (the Motzkin polynomial) is studied 
below. 
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4.2.1 Random full-rank polynomial SOS problems 

We consider random polynomial SOS problems which are constructed so that 
there is a full-rank orthogonal Gram matrix X (an interior point) solving prob- 
lem We use GloptiPoly 3 (see |Las09j ) to generate matrix A and vector b 
as follows: 

N = 5; 7, number of variables 

d = 3 ; 7o half degree 

mpol ( ' v' ,N, 1) ; 7, variables 

P = msdpdninC (v' *v) ~d) ; 7 construct A matrix 

[A,b,c,K] = msedumi(P); 7 retrieve A and K in SeDuMi format 

A = [c';-A]; 7 constant term and sign change 

c = zeros(size(A,2) ,1) ; 7 no objective function 

X = orth(randn(K . s) ) ; 7 random Gram matrix 

b = A*X(:); 7 corresponding right handside vector 

On Table [1] we report execution times (in seconds) required by SeDuMi, 
MPRW and SDPNAL to solve problem (|48|) for d = 3 (degree six polynomials) 
and = 5, . . . , 12. We also indicate the size n of matrix X and the number m 
of constraints (row dimension of matrix A) . We observe that SeDuMi is largely 
outperformed by MPRW and SDPNAL. We also observe that MPRW is about 
4 times slower than SDPNAL, but this is not surprising as MPRW is a sim- 
ple prototype (without comments and printing instructions it is about 50 lines 
of interpreted Matlab), whereas SDPNAL is a much more sophisticated pack- 
age heavily relying on the efficient data handling and numerical linear algebra 
routines of the SDPT3 package. We also recall that SDPNAL makes several 
iterations of MPRW as preprocessing. 



N 


n 


m 


SeDuMi 


MPRW 


SDPNAL 


5 


56 


462 


0.29 


0.03 


0.05 


6 


84 


924 


0.92 


0.05 


0.07 


7 


120 


1716 


4.8 


0.13 


0.10 


8 


165 


3003 


25 


0.35 


0.16 


9 


220 


5005 


110 


0.66 


0.25 


10 


286 


8008 


410 


1.3 


0.43 


11 


364 


12376 


1500 


3.0 


0.73 


12 


455 


18564 


> 3600 


5.0 


1.3 



Table 1: Comparative execution times for SOS problems. 



4.2.2 Random lov^^-rank polynomial SOS problems 

We consider random polynomial SOS problems which are constructed so that 
there is a rank-one Gram matrix X solving problem (j48]). For such problems, 
it is unlikely that there is an interior point x solving problem (|3S|), and indeed 
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SeDuMi does not find a full-rank solution. We use the same code as above, 
replacing the instruction X = orthCrandnCK . s) ) ; with the instructions 

X = orth(randn(K.s,l)) ; 
X = X*X' ; 

We report execution times (in seconds) in Table [51 In comparison with the 
problems with interior points of Table [TJ we observe that all the solvers experi- 
ence convergence issues. However, there is still a considerable improvement in 
terms of efficiency brought by regularization methods, though Slater's qualifi- 
cation constraint cannot be invoked to guarantee convergence. 



N 


n 


m 


SeDuMi 


MPRW 


SDPNAL 


5 


56 


462 


0.83 


0.20 


0.21 


6 


84 


924 


0.85 


0.32 


0.28 


7 


120 


1716 


16 


2.1 


0.51 


8 


165 


3003 


61 


4.8 


0.98 


9 


220 


5005 


330 


12 


1.2 


10 


286 


8008 


1300 


24 


2.5 


11 


364 


12376 


> 3600 


50 


3.5 


12 


455 


18564 


> 3600 


110 


6.6 



Table 2: Comparative execution times for low-rank SOS problems. 
4.2.3 Motzkin's polynomial 

We study a well-known bivariate {N — 2) polynomial of sixth degree {d = 
3) which is non-negative but cannot be written as a polynomial SOS, namely 
Motzkin's polynomial 

Poiv) ^ 1 + vjvjivf + vl ~ 3) 

see |Lau09) or |Las09] . This polynomial achieves its minimum zero at the four 
points = ±1, W2 = il- In a basis of monomials of degree up to 3 there is no 
Gram matrix X solving problem (^S)) . However, it was observed in |HL05) and 
later on shown theoretically in |Las06] that the perturbed polynomial 

Po{v) +epi{v) 

can be represented as a polynomial SOS (with full-rank Gram matrix) pro- 
vided the degree of the perturbation polynomial pi (v) is high enough, inversely 
proportional to scalar e > 0. In some sense, this can be interpreted as a reg- 
ularization procedure as in |HM11| . Practically speaking, since semidefinite 
programming solvers use inexact operations (floating point arithmetic) , it is not 
necessary to perturb explicitly the data. It is enough to choose a basis tt{v) of 
sufficiently high degree d > 3 in relation (|T7)) . and higher-order perturbations 
are automatically introduced by the algorithm. 

We use the following GloptiPoly3 instructions to generate data A, b for 
increasing values of d: 
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d = 8; 7. half degree 
mpol vl v2 

p = l+vl~2*v2~2*(vl~2+v2~2-3) ; 

P = msdp(min(p) ,d) ; 

[A,b,c,K,bO] = msedumi (P) ; 

A = [c ' ; -A] ; 

b = [-bO;-b] ; 

c = zeros(size(A,2) ,1) ; 

For this problem, we set tol=le-6 for the three solvers. When d — 3,4, 5, 6, 
SeDuMi takes less than 0.1 seconds to detect that problem p8)) is infeasible, 
and it provides a Farkas dual certificate vector y G — /C such that b^y — 1. 
When d = 7,8,9,10, SeDuMi takes less than 0.5 seconds to return a vector x 
such that the primal residual \\Ax — 6||2/||^||2 is less than 10~^ and the dual 
objective function b^y is less than 10~^ in absolute value. 

The behavior of SDPNAL and MPRW is more erratic, and convergence 
issues occur, as shown by the execution times (in seconds) of Table |3l For 
d = 3, 4, 5, MPRW stops after 10^ iterations, as there is no mechanism to detect 
infeasibility in this prototype software. 
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time 


\\Ax- 


^'112/ 








3 














4 














5 














6 


15 


6.20 


• 10- 


6 


1.12-10- 


6 


7 


25 


6.03 


• 10' 


7 


6.81 • 10- 


7 


8 


26 


5.80 


• 10- 


6 


-4.08 ■ 10 


-7 


9 


34 


1.01 


• 10- 


6 


-1.45- 10 


-7 


10 


75 


5.42 


■ 10- 


7 


-1.58-10 


-7 


d 


time 


\\Ax- 


bh/ 


\bh 


b'^y 




3 


5.1 


4.28 


■ 10- 


3 


33.4 




4 


9.2 


1.56 


• 10- 


4 


0.832 




5 


3.5 


4.59 


• 10- 


6 


4.37-10- 


5 


6 


4.6 


6.33 


• 10- 


6 


1.05-10- 


6 


7 


5.7 


8.95 


• 10- 


6 


3.86-10- 


7 


8 


5.9 


2.79 


• 10- 


6 


-3.46-10 


-7 


9 


7.9 


2.54 


• 10- 


6 


-3.25- 10 


-7 


10 


8.8 


1.88 


• 10- 


6 


-1.34- 10 


-7 



Table 3: Behavior of MPRW (left) and SDPNAL (right) for Motzkin's polyno- 
mial. 



4.2.4 Regularization vs projection 

Though it solves linear semideflnite problems, using regularization techniques 
somehow generalizes and enhances the idea [HMllj to using projection methods 
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directly for SOS feasibility problems. With this approach indeed, a question is 
to find a good point to project; taking systematically the zero matrix gives inter- 
esting results but could be greatly enhanced. Regularization methods provide a 
numerical solution to this: doing a sequence of truncated projections allows to 
keep the projection idea while getting rid of the question of the initial point to 
project. The behaviour of SDPNAL is interesting with this respect: it does first 
a preprocessing of several alternating direction iterations to get a meaningful 
point, then follows by projection-like iterations. In practice, we observe usually 
a very few iterations, and often one. For example, to decide whether the (admit- 
tedly trivial) polynomial p{v) = X^l^i '^l'^ is SOS, the SDP problem dimensions 
are n = 3003 and m = 184756, and after 90 seconds and only one projection-like 
iteration, SDPNAL provides a vector x satisfying \\Ax — fc||2/||^||2 ~ 1-4 ■ 10"^*^. 

4.3 Unconstrained polynomial minimization 

In this section we study global minimization problems 

p* — min p(v) (51) 

where p{v) is a given polynomial. For this problem, a semidefinite relaxation 
readily follows from the observation that 

p* = maxp p 

s.t. ~ p{v) - £ > 0, Vw e 

and by relaxing the above non-negativity constraint by the semidefinite pro- 
gramming constraint that polynomial p{v) — p is SOS, see [LauOQj and |Las09] . 



4.3.1 Random polynomial minimization problems 

We generate well-behaved instances of unconstrained polynomial minimization 
problems (j5ip with 

JV 

p{v)=po{v)+Y,^^'' 
1=1 

where poiv) is a random polynomial of total degree strictly less than 2d. The 
leading term J2i=i '^f'^ ensures coercivity of p(w) and hence existence of a global 
minimum in (|5ip. We use the following GloptiPoly 3 script to generate our 
examples: 

N = 10; 

mpolCv' ,N,1) ; 

b = mmon(v,0,2*d-l) ; °L degree up to 2d-l 
pO = randnd , length(b) ) ; pO = pO/norm(pO) ; 
p = pO*b + sum(ininon(v,d) . ~2) ; 
P = msdp(min(p) ) ; 
[A,b,c,K] = msedumi (P) ; 
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In Table m we report comparative execution times (in seconds) for d = 2 and 
various values of N, for solving the semidefinite relaxation. It turns out that 
for these generic problems, we observe that global optimality is always certified 
with a rank- one moment matrix |HL05| . Both MPRW and SDPNAL largely 
outperform SeDuMi on these examples. 



N 


n 


m 


ScDuMi 


MPRW 


SDPNAL 


5 


21 


126 


0.09 


0.05 


0.18 


6 


28 


209 


0.11 


0.07 


0.18 


7 


36 


329 


0.24 


0.12 


0.20 


8 


45 


494 


0.36 


0.19 


0.22 


9 


55 


714 


0.77 


0.28 


0.26 


10 


66 


1000 


1.9 


0.45 


0.29 


11 


78 


1364 


5.0 


0.78 


0.36 


12 


91 


1819 


11 


1.1 


0.41 


13 


105 


2379 


20 


1.6 


0.47 


14 


120 


3059 


42 


2.3 


0.65 


15 


136 


3875 


74 


3.0 


0.68 



Table 4: Comparative execution times for semidefinite relaxations of random 
polynomial minimization problems. 

4.3.2 A structured example 

Consider the problem studied in |Nie09[ Example 3.5], that is ([51]) with 
N / i \^ ( ^ 

We solve the semidefinite relaxation for increasing values of N. We collect 
comparative execution times on Table [5] for this example. For example, when 
N = 10, SDPNAL resp. MPRW returns a point x such that \\Ax - b\\/\\b\\ is 
equal to 1.4 • 10~^ resp. 2.6 • 10~^° and the minimum eigenvalue of X is equal 
to zero to machine precision. 

We observe again a considerable improvement in terms of performance brought 
by regularization methods in comparison with a classical interior-point method. 
For larger instances, most of the computation time of SeDuMi is spent for mem- 
ory swapping when constructing and handling large matrices. We refer to the 
recent extensive numerical work of [Nie09| for various structured problems. 

Acknowledgements 

The work of the first author was partly supported by Research Programme 
MSM6840770038 of the Czech Ministry of Education and by Project 103/10/0628 
of the Grant Agency of the Czech Republic. 




31 



N 


n 


m 


ScDuMi 


MPRW 


SDPNAL 


5 


56 


461 


0.50 


0.54 


0.60 


6 


84 


923 


2.5 


1.2 


1.2 


7 


120 


1715 


14 


5.0 


2.6 


8 


165 


3002 


92 


19 


6.7 


9 


220 


5004 


410 


65 


22 


10 


286 


8007 


1800 


200 


71 


11 


364 


12375 


7162 


490 


150 


12 


455 


18563 


> 7200 


1500 


530 


13 


560 


27131 


> 7200 


3500 


2300 


14 


680 


38760 


> 7200 


> 7200 


9900 



Table 5: Comparative execution times for semidefinite relaxations of a larger 
polynomial minimization problem. 
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